Maximum entropy and the problem of moments: A stable algorithm 
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We present a technique for entropy optimization to calculate a distribution from its moments. 
The technique is based upon maximizing a discretized form of the Shannon entropy functional by 
mapping the problem onto a dual space where an optimal solution can be constructed iteratively. 
We demonstrate the performance and stability of our algorithm with several tests on numerically 
difficult functions. We then consider an electronic structure application, the electronic density of 
states of amorphous silica and study the convergence of Fermi level with increasing number of 
moments. 
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One of the fixed themes of physics is the solution of 
inverse problems. A ubiquitous example in theoreti- 
cal physics is the "Classical Moment Problem" (CMP), 
in which only a finite set of power moments of a non- 
negative distribution function p is known, and the full 
distribution is needed jljj. It is obvious that the solu- 
tion for p is not unique for a finite set of moments. This 
non-uniqueness suggests the need for a "best guess" for 
p, based upon the available information. With its ul- 
timate roots in nineteenth century statistical mechanics 
and a subsequent strong justification based upon proba- 
bility theory, the "maximum entropy" (maxent) method 
has provided an extremely successful variational princi- 
ple to address this type of inverse problem Collins 
and Wragg used the maxent method to solve the CMP 
for a modest number of moments In a comprehen- 
sive paper with seminal applications, Mead and Papani- 
colaou [J| solved the CMP with maximum entropy tech- 
niques and proposed the first practical numerical scheme 
to solve the moment problem for up to 15 moments. In 
a host of subsequent papers, the utility of the method 
as an unbiased and surprisingly efficient (rapidly con- 
vergent) solution of the CMP has been established. The 
principle has been used extensively in a number of diverse 
applications ranging from image construction to spectral 
analysis, large-scale electronic structure problems [^|^, 
series extrapolation and analytic continuation 0, quan- 
tum electronic transport ligand-binding distribution 
in polymers 0, and transport planning |lO|. 

There exist a number of maximum entropy algo- 
rithms 01 S Ell 12, 1^ that have been developed over 
the last two decades. Many of the algorithms (but not 
all) are constrained by the number of moments that it 
can deal with and become unreliable when the number of 
constraints exceeds a problem-dependent upper limit. As 
the number of moments increases, the calculation of mo- 
ments (particularly the power moments) becomes more 
sensitive to machine precision and the optimization prob- 
lem becomes ill-conditioned. It has been observed that 



implementation of a maxent algorithm with more than 
20 power moments is notoriously difficult even with ex- 
tended precision arithmetic and it rarely gives any further 
information on the nature of the distribution. The use 
of orthogonal polynomials as basis set significantly im- 
proves the accuracy and remedies most of the problems 
that one encounters with power moments. 

In this paper we present an iterative approach to con- 
struct the maxent solution of CMP, which is based upon 
discretization of the Shannon entropy functional |l4| . 
The essential idea is to discretize Shannon entropy and 
map the problem from the primal space onto dual space 
where an optimal solution can be constructed iteratively 
without the need of matrix inversion. We discuss the- 
oretical ideas and develop algorithms that can be used 
with both power and Chebyshev moments. The stability 
and the accuracy of the method are discussed with refer- 
ence to two numerically non-trivial examples - a uniform 
distribution and a double-delta function. Wc illustrate 
the usefulness of our technique by computing the elec- 
tronic density of states (EDOS) of amorphous silica with 
particular emphasis on convergence of the Fermi level as 
a function of number of moments. 

The starting point of our approach is to use a dis- 
cretized form of the Shannon entropy functional 
using a quadrature formula 



p{x) \Tip(x)dx 
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Here Wj and Xj are the weights and abscissas of any ac- 
curate quadrature formula, say the Gauss-Legendre and 
without any loss of generality we restrict ourselves to x € 
[0,1]. We want to maximize S subject to the discretized 
moment constraints 
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where we define pj = wj Pj and Uij = Xj . The entropy 
optimization program (EOP) can now be stated as to 
optimize the Lagrangian function 
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and the solution can be written as 
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^ Wjexp i^a^jfj, - l\ , j = l,2,...,n (4) 



Since w > 0, Eq.Q imphes that p > 0. Furthermore, 
the conditions in Eqs.(|2Jl and can be lumped together 
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fly Wj exp ^ akjfik - 1 - pii = 0, V i. 



(5) 

We now sec from Eq.Q that the original constrained 
optimization program is now reduced to an unconstrained 
convex optimization program involving the dual variables 



min d(r)) = V wj exp( V r], - 1) - V PtVi (6) 
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If the dual optimization program stated above has an 
optimal solution 77*, the solution Pj{r]*) can be obtained 
from Eq.Q. Bergman has proposed an iterative method 
to minimize the dual objective function d{rj) taking only 
one dual variable at a time . The method starts with 
an arbitrarily chosen £ i?™, and then cyclically up- 
dates all the dual variables as follows: 

Step 1: Start with any ij^ S and a sufficiently small 
tolerance level e > 0. Set k = and obtain p'j. 

Step 2: Let i = (k mod m) + 1. Solve the equation 



•/-f (A'^) = "^J^J exp(ay A'=) - ^i, = (7) 
Step 3: Update each component of rj 



fjf+' =vf + XHi^l^^), ryf+i-Tyf if/^* (8) 

Step 4: If Eq.© is satisfied within the preset level of 
tolerance, then stop with rj* = fj'' , and obtain the primal 
solution from Eq.Q. Otherwise, calculate 



exp(^a,,?7,f+i-l), j = l,2,...,n (9) 
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and go to Step 2 



From a computational point of view, the most prob- 
lematic part of the above algorithm is the solution of 
the set of Eq.© in Step 2. In a variant of the above 
scheme known as multiplicative algebraic reconstruction 
technique 0, 0| , one uses the following closed- form ex- 
pression to approximate the correction term A*^ 



Xf = In 
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Step 3 of the algorithm is now modified by substitut- 
ing the expression above for A*^ in Eq.||HJ). A conver- 
gence theorem for the modified algorithm can be found 
in Lent I* is, however, quite easy to see that the 

algorithm will fail unless for every z = 1, 2, ., m, either 
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We note that in this case we are assured of conver- 
gence of the solution of our discretized EOP because the 
condition (|ll|l holds. 

The EOP algorithm above can only be used provided 
that the condition stated by the inequality Hll|l or l|12() 
is satisfied. This constrains us to apply the algorithm for 
power moments but neither of these two are necessarily 
true for other polynomial moments. In order to work 
with Chebyshev polynomials, we first employ the aver- 
ages of shifted Chebyshev polynomials [l9j of the first 
kind T*{x) = Tn{2x — 1) to recast the entropy opti- 
mization program (EOP) given by statement |(3J). The 
only change needed for this purpose is to redefine by 
Oij = Tj {xj). 

Our next step is to find a transformation that will con- 
vert the EOP into an equivalent problem in which all the 
program parameters are non-negative. For finding the 
necessary transformation, we define for i = 1, 2, 3, ra. 



Uj = [max(— Cy)] + 1. 
j 



Obviously, for i — 1,2,3,..., m and j ~ 1, 2, 3, n, 

(ui + Oij) > 0. 
Let us now define for i = 1,2, 3, m. 



Mi = max(uj -I- ) ; ti = 
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It is easy to see that the following relations hold for 
i = 1, 2, 3, TO 



M, > 0, tj > 

(M, + l)tj^ —, t, {U^ + Qij) < t, M, < — 
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For 1 = 1,2, 3, TO and j = 1, 2, 3, n, let us define 

fljj = ti(ui + aij). (16) 

Apparently, for i — 1,2, 3, to and j = 1, 2, 3, n, we 
have 

mm 

- > fly > ; < ^ = ^ t,(M. + ay) < 1. (17) 
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It is interesting to note that if p is a feasible solution 
to the EOF involving averages of T*{x), then for i = 
1,2,3, ...,m 

n n 

'^a^JPJ ^^ti{ui + aij)pj =ti{ui + fii). (18) 
Hence, if we define for i = 1,2,3,..., m. 



ti (Ui + ^i) = ^ 
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It is easy to verify that 1/to > /i^ for i — 1,2,3, ...,to. 
The transformed EOF has thus the same form as previ- 
ously, except for the fact that we use Ea. H19|l in place of 
Eq.(|2Jl. Since both Oj^- and fi^ can take only positive val- 
ues, a feasible solution to the original program can now 
be obtained by replacing atj and /i^ j in Eq. (|SJ) by a, ■ and 



We consider two numerically difficult examples, a uni- 
form distribution and a double-delta function, to study 
the stability and accuracy of the algorithm. The Cheby- 
shev moments of these two functions can be exactly cal- 
culated. Earlier efforts to reproduce these distributions 
have met with limited success because of the difficulty 
in matching a sufficient number of moments and for the 
singular nature of the functions. It would be interesting 
to see how the algorithm performs in case of a) Uniform 
distribution f{x) = 1, x G [0,1] and b) a double-delta 
function g{x) = S{x - j) + S{x - |), x £ [0, 1]. 

The algorithm produces the uniform distribution cor- 
rectly up to five decimal places. We found that the first 
25 shifted Chebyshev moments are sufficient for this pur- 
pose. The fact that the end points have been produced so 
accurately without any spurious oscillations is a defini- 
tive strength of this approach and reflects the stability 
and accuracy of our algorithm. In figure Q we have plot- 
ted the result for the double-delta function. The result 
is equally convincing and certainly establishes the use- 
fulness of this method over the other existing ones in 
the literature. In addition to these examples, we have 
also tested our algorithm to reconstruct a Tent map, a 
semicircular distribution, a square-root distribution and 
a distribution with a gap in the spectrum. In all these 
cases, the algorithm correctly produces all the features 
of the distributions without failing. These results clearly 
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FIG. 1: Reconstruction of a double-delta function 
g(x) = 5(x — -j) + <5(x — I) from shifted Chebyshev moments. 
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FIG. 2: Normalized electronic density of states/eV (dotted 
line) of amorphous silica using the first 60 shifted Chebyshev 
moments. The distribution of energy eigenvalues (point) from 
direct diagonalization of the Hamiltonian matrix is also plot- 
ted in the figure. Normalized Fermi level is at 0.595 eV. 



demonstrate that the algorithm is very stable, accurate 
and is capable of producing some very uncommon dis- 
tributions (such as double-delta function) without any 
difficulty. 

We now consider a practical case where exact moments 
are not known but approximate moments are available. 
An archetypal example is the calculation of electronic 
density of states from its moments. In the context of 
solid state physics, maxent has been used profitably to 
calculate the density of electronic (vibrational) states 
from a knowledge of the moments of the Hamiltonian 
(Dynamical) matrix. The computation of moments it- 
self is an interesting problem in this field and there are 
methods available in the literature that specifically ad- 
dress this issue ^3 ■ Here one is interested in deter- 
mining physical quantities such as Fermi level and band 
energy of large systems (e.g. clusters, biological macro- 
molecules etc.) without diagonalizing the Hamiltonian 
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large number of moments [2l| (up to 500). The useful- 
ness of this algorithm is demonstrated by constructing 
some numerically difhcult distributions and applying it 
to amorphous silica to compute the electronic density of 
states and the Fermi level. 
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FIG. 3: Fermi level of amorphous silica as a function of 
number of the shifted Chebyshev moments. The value ob- 
tained from direct diagonalization of the Ifamiltonian matrix 
is -5.465 eV and is plotted as a horizontal line in the figure. 

matrix. For amorphous semiconductors, this is partic- 
ularly suitable because of disordered scattering (of elec- 
trons) that washes out the van Hove singularities in the 
electronic spectrum. A stable and accurate maxent algo- 
rithm, therefore, would be very useful in calculating elec- 
tronic properties of amorphous semiconductors. The two 
examples discussed above suggest that we should be able 
to produce complex electronic spectrum with a gap (or 
gaps) to a high degree of precision and hence the Fermi 
level and band energy. As for metallic systems, the de- 
termination of Fermi energy is a non-trivial problem for 
0(n) methods. A primary requirement for a maxent al- 
gorithm in this case is that 1) it must produce the dis- 
tribution accurately and 2) it must do so in a stable way 
using a sufficient number of moments to correctly pro- 
duce the singularities of the spectrum. It is very pleasing 
to note that our algorithm does satisfy this requirement 
and therefore may offer an alternative approach to com- 
pute Fermi energy of metallic systems. 

In figure EJ we have plotted the EDOS of amorphous 
silica using first 60 moments and compared it to the re- 
sult obtained by direct diagonalization of the Hamilto- 
nian matrix. It is clear from the figure that all the fea- 
tures of the EDOS are correctly produced by our maxent 
algorithm. Finally, in figure 01 we have plotted the varia- 
tion of Fermi energy with the number of moments. The 
Fermi energy is computed by integrating the normalized 
density of states to obtain the correct number of total 
electrons. It is clear from figure |2I that the Fermi energy 
starts to converge after first 30 moments and eventually 
converges after 40 moments. 

In conclusion, we present an algorithm for maximum 
entropy construction of a distribution from its moments. 
The algorithm is very stable, accurate and can handle a 
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